Parallelized Stochastic Gradient Descent
Authors
Abstract
With the increase in available data, parallel machine learning has become an increasingly pressing problem. In this paper we present the first parallel stochastic gradient descent algorithm, including a detailed analysis and experimental evidence. Unlike prior work on parallel optimization algorithms [5, 7], our variant comes with parallel acceleration guarantees and poses no overly tight latency constraints, which might only be available in the multicore setting. Our analysis introduces a novel proof technique, contractive mappings, to quantify the speed of convergence of parameter distributions to their asymptotic limits. As a side effect, this answers the question of how quickly stochastic gradient descent algorithms reach the asymptotically normal regime [1, 8].
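To make the setup concrete, here is a minimal sketch of the parameter-averaging pattern the abstract describes: each of k workers runs plain sequential SGD on its own shard of the data, and the resulting parameters are averaged in a single communication step at the end. The least-squares loss, learning rate, and other hyperparameters are illustrative assumptions, not the paper's exact protocol.

import numpy as np

def sgd_shard(X, y, eta=0.01, epochs=5, seed=0):
    """Plain sequential SGD for least-squares regression on one shard."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x.w - y)^2
            w -= eta * grad
    return w

def simu_parallel_sgd(X, y, k=4):
    """Split the data into k shards, solve each independently, average."""
    shards = zip(np.array_split(X, k), np.array_split(y, k))
    ws = [sgd_shard(Xs, ys, seed=j) for j, (Xs, ys) in enumerate(shards)]
    return np.mean(ws, axis=0)  # one communication round at the very end

# Toy usage: recover a planted linear model from noisy observations.
rng = np.random.default_rng(1)
X = rng.normal(size=(4000, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true + 0.1 * rng.normal(size=4000)
print(simu_parallel_sgd(X, y))  # should land near [1, 2, 3, 4, 5]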
Similar Resources
Splash: User-friendly Programming Interface for Parallelizing Stochastic Algorithms
Stochastic algorithms are efficient approaches to solving machine learning and optimization problems. In this paper, we propose a general framework called Splash for parallelizing stochastic algorithms on multi-node distributed systems. Splash consists of a programming interface and an execution engine. Using the programming interface, the user develops sequential stochastic algorithms without ...
Investigations on Hessian-free optimization for cross-entropy training of deep neural networks
Context-dependent deep neural network HMMs have been shown to achieve recognition accuracy superior to Gaussian mixture models in a number of recent works. Typically, neural networks are optimized with stochastic gradient descent. On large datasets, stochastic gradient descent improves quickly at the beginning of the optimization, but since it does not make use of second-order information, ...
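As a rough illustration of what second-order information buys here, the sketch below takes one Hessian-free step: it approximately solves H p = -g by conjugate gradient using only Hessian-vector products (obtained via a finite difference of the gradient), which is the core idea behind Hessian-free optimization. The convex quadratic objective is an illustrative stand-in for a cross-entropy network loss.

import numpy as np

# Convex quadratic 0.5 * w'Aw - b'w as a stand-in objective (A is SPD).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad = lambda w: A @ w - b

def hess_vec(w, v, eps=1e-6):
    """Hessian-vector product via a finite difference of the gradient."""
    return (grad(w + eps * v) - grad(w)) / eps

def hf_step(w, cg_iters=10):
    """One Hessian-free step: solve H p = -g by CG, then move to w + p."""
    g = grad(w)
    p = np.zeros_like(w)
    r = -g            # residual of H p = -g at p = 0
    d = r.copy()      # search direction
    for _ in range(cg_iters):
        Hd = hess_vec(w, d)
        alpha = (r @ r) / (d @ Hd)
        p = p + alpha * d
        r_new = r - alpha * Hd
        if r_new @ r_new < 1e-12:  # converged; avoid a 0/0 in beta
            break
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d
        r = r_new
    return w + p

w = hf_step(np.zeros(2))
print(w)                      # one step lands at the minimizer here
print(np.linalg.solve(A, b))  # exact solution for comparison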
Parallel Computation for Chromosome Reconstruction on a Cluster of Workstations
Reconstructing a physical map of a chromosome from a genomic library presents a central computational problem in genetics. Physical map reconstruction in the presence of errors is a problem of high computational complexity, which motivates parallel computing. Parallelization strategies for a maximum likelihood estimation-based approach to physical map reconstruction are present...
Simulated Annealing with Levy Distribution for Fast Matrix Factorization-Based Collaborative Filtering
Matrix factorization is one of the best approaches to collaborative filtering because of its accuracy in capturing the latent factors of users and items. Its main disadvantages are its computational cost and the difficulty of parallelizing it, especially for very large matrices. In this paper, we introduce a new method for collaborative filtering based on matrix factorization ...
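For context, the sketch below shows the standard SGD baseline for matrix factorization that such methods start from: approximate the rating matrix R as P Q^T and fit the latent factors on the observed entries. The paper itself replaces this inner optimizer with simulated annealing using Levy-distributed moves; all hyperparameters here are illustrative assumptions.

import numpy as np

def mf_sgd(ratings, n_users, n_items, rank=2, eta=0.02, lam=0.05, epochs=500):
    """Fit R ~ P @ Q.T by SGD over observed (user, item, value) triples."""
    rng = np.random.default_rng(0)
    P = 0.1 * rng.normal(size=(n_users, rank))  # user latent factors
    Q = 0.1 * rng.normal(size=(n_items, rank))  # item latent factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]               # prediction error
            # Simultaneous regularized updates (RHS uses the old factors).
            P[u], Q[i] = (P[u] + eta * (err * Q[i] - lam * P[u]),
                          Q[i] + eta * (err * P[u] - lam * Q[i]))
    return P, Q

# Toy usage: a 3x3 rating matrix with some entries unobserved.
obs = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
       (2, 1, 2.0), (2, 2, 4.0)]
P, Q = mf_sgd(obs, n_users=3, n_items=3)
print(P @ Q.T)  # reconstruction, including predictions for missing cells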
Optimized Online Rank Learning for Machine Translation
We present an online learning algorithm for statistical machine translation (SMT) based on stochastic gradient descent (SGD). Under the online setting of rank learning, a corpus-wise loss has to be approximated by a batch local loss when optimizing for evaluation measures that cannot be linearly decomposed into a sentence-wise loss, such as BLEU. We propose a variant of SGD with a larger batch ...
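The pattern described above, replacing a per-sentence update with a batch local loss, can be sketched generically: accumulate the gradient jointly over a mini-batch before each parameter update. The squared loss below is an illustrative stand-in for the actual rank-learning objective, not the paper's setup.

import numpy as np

def minibatch_sgd(X, y, batch_size=32, eta=0.05, epochs=10, seed=0):
    """SGD where each update uses a loss evaluated jointly on a batch."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(y))
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            # Batch local loss: the gradient is computed over the whole
            # mini-batch, not decomposed into per-example updates.
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= eta * grad
    return w

# Toy usage on a synthetic linear regression problem.
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))
y = X @ np.array([2.0, -1.0, 0.5, 3.0])
print(minibatch_sgd(X, y))  # should land near [2, -1, 0.5, 3]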
Journal:
Volume / Issue:
Pages: -
Publication year: 2010